DETAILED ACTION

Claim Rejections - 35 USC §112 

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention.

Claims 1, 2, 4 to 14, and 16 to 26 are rejected under 35 U.S.C. 112, second
paragraph, as being indefinite for failing to particularly point out and distinctly claim the 
subject matter which applicant regards as the invention. 

Regarding independent claims 1 and 10, as amended, the term "necessarily" 
raises issues as to indefiniteness under 35 U.S.C. §112, Second Paragraph, as there 
are examples provided in the Specification where the audio feedback does not 
necessarily reflect the spotted words. Examples are dialogue turns 2, 5, 8, and 9, on 
Pages 14 to 15 of the Specification, where the confirmation message does not echo 
commands for "delete all", "correction", "repeat", and "send". Instead, a corresponding 
jingle is played, or the reply is a repetition of a prior utterance, or a response is 
"searching database". Also, in dialogue turn 4, the confirmation message 
misrecognizes the user input, and the audio feedback does not necessarily reflect the 
spotted words in the input utterance because the input utterance is misrecognized. 
Thus, the scope and definiteness of "necessarily" is not clear. 

Regarding new claims 23 and 25, there are issues of indefiniteness with respect 
to several terms, which are not provided support by the Specification. The terms "tightly 
coupled", "instant feedback", and "immediate opportunity" are not defined by the
Specification. One having ordinary skill in the art would not know how "tightly" the 
dialogue must be coupled to meet the scope of the claims. Nor would one skilled in the 
art understand what constitutes "instant" feedback or an "immediate" opportunity to 
correct recognition errors. These terms are not defined expressly by the Specification, 
and one skilled in the art would not know what scope to accord them. Thus, these 
terms are indefinite. 

Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 
A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a 
foreign country or in public use or on sale in this country, more than one year
prior to the date of application for patent in the United States. 

Claims 1, 2, 6, 8 to 10, 13, 14, 17, 19, 20, and 23 to 26 are rejected under 35
U.S.C. 102(b) as being anticipated by Takebayashi et al.

Regarding independent claim 1, Takebayashi et al. discloses a method of data
entry by voice, comprising:

"communicating an input utterance from a speaker to a speech recognition
means" - the speech understanding unit 11 includes a speech recognition
device for recognizing words or sentences in the input speech, and is capable
of extracting a semantic content intended to be expressed in the input speech
by analyzing the input speech, in a form of a semantic utterance representation
(column 6, lines 48 to 54: Figure 1);

"spotting a plurality of spotted words of at least two recognized spoken words 
within the input utterance, wherein the spotted words form a phrase containing at least 
one of field-specific values and commands" - keyword detection unit 21 (column 8, line
55 to column 9, line 22: Figure 2); keywords are received in a word lattice or frame
format ("field-specific values"), e.g. "three" "hamburgers" (column 10, lines 6 to 17:
Figure 4); keywords include commands such as "order", "cancellation", and 
"replacement" commands (column 10, lines 18 to 24: Figure 5); 

"echoing at least one of recognized values and commands back to the speaker 
via a text-to-speech system, wherein audio feedback echoing at least one of recognized 
values and recognized commands is performed upon interpretation of each input 
utterance, and a sequence of the recognized values echoed in the audio feedback 
necessarily reflects a sequence of the spotted words within the input utterance" - 
response generation unit 13 (column 7, lines 23 to 43: Figure 2; column 17, lines 61 to
65); the multimodal response output generated such that the speech response for the 
confirmation message of "Your orders are one hamburger, two coffees, and four large 
colas, right?" is outputted from the loudspeaker unit 15 (column 13, lines 41 to 50: 
Figure 12C); an order contains both "values" and "commands", as the values are the 
numbers and types of each item ordered, and the order is a command to provide the 
items ordered; for a situation in which one hamburger and one cola has already been 
ordered, the confirmation is "Your orders are one hamburger and one cola, right?" 
(column 22, lines 18 to 25: Figure 30B); order confirmation messages are "audio
feedback echoing" values and commands; for an order of one hamburger and one cola, 
the audio confirmation message "necessarily reflects a sequence of the spotted words 
in the input utterance" by identifying a quantity associated with each item ordered; 

"rejecting unreliable or unsafe input for which a confidence measure is found to 
be low" - (column 13, lines 6 to 10; column 20, lines 35 to 58; column 24, lines 18 to 44; 
column 25, lines 24 to 37); 

"maintaining a dialogue history enabling editing operations and correction 
operations on all active fields" - (column 6, lines 50 to 57); editing operations and 
correction operations include "addition", "cancellation", and "replacement" (column 10, 
lines 18 to 24: Figure 5). 

Regarding independent claim 10, Takebayashi et al. discloses an article of
manufacture for data entry by voice, comprising: 

"an operating system" - processing unit 291 contains an operating system 
(column 29, lines 49 to 56: Figure 45); 

"a memory in communication with said operating system" - memory 292 (column 
29, lines 29 to 56: Figure 45); 

"a speech recognition means in communication with said operating system" - 
speech understanding unit 11 (column 6, lines 44 to 50: Figure 1); 

"a speech generation means in communication with said operating system" - 
response generation unit 13 (column 7, lines 23 to 43: Figure 1);



Application/Control Number: 09/921,766 Page 6

Art Unit: 2654 

"a dialogue history maintenance means in communication with said operating 
system" - (column 6, lines 50 to 57); 

"wherein said operating system manages said memory, said speech recognition 
means, said speech generation means, and said dialogue history maintenance means 
in a manner permitting the user to monitor speech recognition of an input utterance by 
means of a generated speech corresponding to at least one of field-specific values and 
commands contained within the phrase formed by spotted words within the input 
utterance, and to perform editing operations and correction operations on all active 
fields, wherein audio feedback echoing at least one of recognized values and 
recognized commands is performed upon interpretation of each input utterance, and a 
sequence of the recognized values echoed in the audio feedback necessarily reflects a 
sequence of the spotted words within the input utterance" - keyword detection unit 21 
(column 8, line 55 to column 9, line 22: Figure 2); keywords are received in a word 
lattice or frame format ("field-specific values"), e.g. "three" "hamburgers" (column 10, 
lines 6 to 17: Figure 4); keywords include commands such as "order", "cancellation", 
and "replacement" commands (column 10, lines 18 to 24: Figure 5; column 6, lines 50 to 
57); the multimodal response output generated such that the speech response for the 
confirmation message of "Your orders are one hamburger, two coffees, and four large 
colas, right?" is outputted from the loudspeaker unit 15 (column 13, lines 41 to 50: 
Figure 12C); an order contains both "values" and "commands", as the values are the 
numbers and types of each item ordered, and the order is a command to provide the 
items ordered; for a situation in which one hamburger and one cola has already been 
ordered, the confirmation is "Your orders are one hamburger and one cola, right?"
(column 22, lines 18 to 25: Figure 30B); order confirmation messages are "audio
feedback echoing" values and commands; for an order of one hamburger and one cola, 
the audio confirmation message "necessarily reflects a sequence of the spotted words 
in the input utterance" by identifying a quantity with each item ordered. 

Regarding claims 2 and 14, syntactic and semantic analysis unit 21 determines 
keywords by semantics (column 6, lines 44 to 50; column 9, lines 38 to 50). 

Regarding claims 6, 9, 17, and 20, correction commands include "cancellation" 
commands for deletion of a last entry, e.g. "That's Wrong" and "Cancel" (column 10, 
lines 18 to 24: Figure 5) and deletion confirmation (Figure 15B). 

Regarding claims 8 and 19, editing operations include "replacement" commands 
"Rather" and "Instead" (column 10, lines 18 to 24: Figure 5) and replacement
confirmation (Figure 15B). 

Regarding claim 13, response generation unit 13 generates the speech response 
in a synthesized voice (column 7, lines 23 to 43: Figure 1). 

Regarding claims 23 and 25, Takebayashi et al. discloses a keyword lattice and
frame format entries for filling in the blank of "each uttered block of text"; orders are
confirmed so that corrections to orders can be made ("affording the speaker an
immediate opportunity to correct any recognition errors") (column 14, lines 9 to 20;
column 18, lines 21 to 52); dialogue management unit 12 provides "a dialogue model"
for providing feedback through response generation unit 13 (column 6, lines 55 to 62:
Figure 1). 

Regarding claims 24 and 26, Takebayashi et al. discloses speech understanding
unit 11 provides for speech recognition and passes to dialogue management unit 12
(column 6, lines 48 to 62: Figure 1).



Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed 
or described as set forth in section 102 of this title, if the differences between the 
subject matter sought to be patented and the prior art are such that the subject 
matter as a whole would have been obvious at the time the invention was made 
to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was 
made. 

Claims 4, 5, 11, 12, and 16 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Takebayashi et al. in view of LaRue.

Concerning claims 4 and 16, Takebayashi et al. omits automatic adaptation after
a form is filled in and sent for search in a database. However, it is generally well known 
to provide adaptation to a user's voice for a voice recognition system during downtime 
once a processing session is completed. LaRue teaches automatic adaptation of a 
word recognition procedure to individual users. (Column 3, Lines 39 to 42; Column 10, 
Lines 64 to 67; Column 13, Lines 28 to 30) It would have been obvious to one having 
ordinary skill in the art to perform automatic adaptation as suggested by LaRue after 
conclusion of an ordering session in Takebayashi et al. for the purpose of adapting a
voice of an individual user when the processor is not active.

Concerning claims 5, 11, and 12, Takebayashi et al. omits a backup input system
as a keyboard or touch screen. However, LaRue teaches a speech recognition system 
including a keyboard and an input panel 36 to enhance the ability to communicate 
audibly in a man-machine interaction. (Column 1, Lines 19 to 27; Column 4, Lines 36 to
39; Column 13, Lines 62 to 63: Figure 2) Including an additional input device in a 
speech recognition system is generally well known for the purpose of providing flexibility 
by permitting a plurality of modes of input or when one input device fails to operate. It 
would have been obvious to one having ordinary skill in the art to include a backup input 
system as a keyboard or input panel as taught by LaRue in the human-computer 
interaction system of Takebayashi et al. to improve and enhance the flexibility of a
man-machine interaction for a speech recognition system.

Claims 7 and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Takebayashi et al. in view of Cornelison.

Takebayashi et al. omits letters and numbers for a license plate as field-specific
values. However, Cornelison teaches a parking ticket enforcement system allowing for 
the search of license plates by key words of letters and numbers through voice input 
from a police officer. (Column 7, Line 11 to Column 8, Line 39) This is desirable to
provide a police officer on duty the capability of conveniently and effectively determining 
whether or not an observed vehicle has been associated with criminal activity. (Column 
1, Lines 39 to 48) It would have been obvious to one having ordinary skill in the art to
apply the word lattice and frame format in the voice data entry of Takebayashi et al. to 
recognize letters and numbers of a license plate as taught by Cornelison for the 
purpose of providing a police officer on duty the capability of conveniently and 
effectively determining whether or not an observed vehicle has been associated with 
criminal activity. 

Claims 21 and 22 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Takebayashi et al. in view of Richards.

Takebayashi et al. omits full duplex dialogue interaction with speech recognition
and auditory feedback. However, full duplex interaction is well known for interactive 
voice response (IVR) systems, generally. Particularly, Richards teaches a sound card 
for analogous art game software, where the sound engine is capable of running in a full 
duplex mode to generate sound while concurrently receiving spoken utterances. 
(Column 6, Lines 39 to 56: Figure 1B) It is suggested that full duplex capability provides
greater flexibility for interactive voice response (IVR) systems so that a user need not 
wait for the system to cease generating sound before the user begins to talk. It would 
have been obvious to one having ordinary skill in the art to utilize full duplex dialogue 
interaction with speech recognition and auditory feedback as suggested by Richards in 
the speech dialogue system of Takebayashi et al. for the known purpose of providing 
greater flexibility for interactive voice response (IVR) systems. 

Response to Arguments

Applicants' arguments filed 21 June 2004 have been fully considered but they 
are not persuasive. 

Regarding independent claims 1 and 10, Applicants argue Takebayashi et al. 
fails to anticipate the limitation of audio feedback necessarily reflecting a sequence of 
spotted keywords. Applicants maintain feedback does not reflect the sequence of 
spotted keywords in every case for Takebayashi et al. This position is traversed. 

Firstly, the term "necessarily" of independent claims 1 and 10 raises issues of 
indefiniteness under 35 U.S.C. §112, Second Paragraph. There are examples provided 
in the Specification where the audio feedback does not necessarily reflect the sequence 
of the spotted words. Examples are dialogue turns 2, 5, 8, and 9, on Pages 14 to 15 of 
the Specification, where the confirmation message does not echo commands for "delete 
all", "correction", "repeat", and "send". Instead, a corresponding jingle is played, or the 
reply is a repetition of a prior utterance, or a response is "searching database". Also, in 
dialogue turn 4, the confirmation message misrecognizes the user input, and the audio 
feedback does not necessarily reflect the spotted words in the input utterance because
the input utterance is misrecognized. Thus, the scope and definiteness of "necessarily" 
is not clear given the examples provided by Applicants' Specification. 

Secondly, there are at least some examples where the feedback necessarily 
reflects the sequence of the spotted keywords in Takebayashi et al. Specifically, the 
feedback necessarily reflects the sequence of spotted keywords for any original order of 
Takebayashi et al. For any original order, a user places an order and the system
echoes back the order. At Column 13, Lines 36 to 50, the user orders one hamburger, 
two coffees, and four large colas; the system echoes back an audio response, saying 
"Your orders are one hamburger, two coffees, and four colas, right?" See Figures 12A 
to 12C. Each original order necessarily reflects the sequence of spotted keywords 
between the quantity of the item and the item ordered in the echoed audio speech 
response. Similarly, Column 22, Lines 4 to 25, says a user's original order is one 
hamburger and one cola; the system echoes back a response, saying "Your orders are 
one hamburger and one cola, right?" See Figures 30A and 30B. Finally, Figures 16, 
18, 21, and 22A to 22D of Takebayashi et al. provide for partial confirmation of orders
and for one by one confirmation. One by one confirmation provides confirmation of 
each item ordered, so that when an order includes two hamburgers, the system 
confirms, "Let me confirm one by one. You want two hamburgers, right?" One by one 
confirmation preserves the sequence of the quantity "two" and the item "hamburgers". 
Thus, Takebayashi et al. discloses a number of embodiments where feedback 
necessarily reflects a sequence of spotted keywords. 

Regarding claims 21 and 22, Applicants argue neither Takebayashi et al. nor
LaRue discloses "providing a full duplex dialogue interaction including speech 
recognition and passive, auditory feedback." 

However, it is maintained Applicants have overlooked the fact that the rejection 
of claims 21 and 22 under 35 U.S.C. §103(a) is obviousness over Takebayashi et al. in
view of Richards. It is Richards that is cited for teaching the feature of full duplex
dialogue interaction, not Takebayashi et al. or LaRue. 

Regarding claims 23 to 26, newly added, Takebayashi et al. anticipates these
claims. Moreover, these claims raise new issues of indefiniteness. 

Therefore, the rejections of claims 1, 2, 4 to 14, and 16 to 26 under 35
U.S.C. 112, 2nd ¶; of claims 1, 2, 6, 8 to 10, 13, 14, 17, 19, 20, and 23 to 26 under 35
U.S.C. 102(b) as being anticipated by Takebayashi et al.; of claims 4, 5, 11, 12, and 16
under 35 U.S.C. 103(a) as being unpatentable over Takebayashi et al. in view of
LaRue; of claims 7 and 18 under 35 U.S.C. 103(a) as being unpatentable over 
Takebayashi et al. in view of Cornelison; and of claims 21 and 22 under 35 U.S.C. 
103(a) as being unpatentable over Takebayashi et al. in view of Richards, are proper. 

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Martin Lerner whose telephone number is (703) 308- 
9064. The examiner can normally be reached on 8:30 AM to 6:00 PM Monday to 
Thursday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached on (703) 305-9645. The fax phone
number for the organization where this application or proceeding is assigned is 703- 
872-9306. 

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 